1 Introduction

In this part of my project I will refine my research questions and further examine the effects of the pandemic on recent MCPS high school graduates enrolled at Montgomery College. For the purposes of this study I will limit my dataset to MCPS graduates aged 20 and younger, and divide them further into subgroups by gender and race. The datasets used in this part of my project were cleaned during my initial data analysis, but outliers have not been removed; I will conduct my statistical analysis both with and without the outliers.

2 Data Dictionary

For the purposes of this project, the following variables and definitions are important.

The population in this dataset is the incoming cohorts of students in Fall 2019 and Fall 2020. These students are first-time degree or certificate seekers with no prior tertiary education, although they may have earned AP credits in high school.

Fall2019 refers to the incoming freshman cohort in Fall 2019 (term year 2020).
Fall2020 refers to the incoming freshman cohort in Fall 2020 (term year 2021).

Variables of interest:
term_year: incoming students in Fall 2019 are assigned to term year 2020; incoming students in Fall 2020 are assigned to term year 2021.
hours_earned: credit hours the student earned in their first Fall semester (this can include credits earned in Summer school second session, Summer 1, and AP credits earned in high school).
hours_attempted: credit and non-credit hours the student attempted in their first Fall semester (this may include credits attempted in Summer school second session, Summer 1).
full_part: whether the student is full-time (FT) or part-time (PT). Part-time students are registered for fewer than 12 credit hours; full-time students take at least 12 credits.
major: degree programme the student is registered for, or certificate&LR (letter of recommendation). All certificates and letters of recommendation have been grouped together.
hours_earned_rate: ratio hours_earned / hours_attempted.
age: age of the student at the start of the program.
race: racial classification of the student.
sex: gender classification of the student.
high_school: name of the high school the student graduated from. Public high schools in Montgomery County are classified as MCPS.
pell_grant: whether or not the student receives a Pell grant.

3 Data Wrangling

3.1 Import Data

Summary of Data and Types

skim(df_Degrees)
Data summary
Name df_Degrees
Number of rows 7123
Number of columns 24
_______________________
Column type frequency:
character 15
logical 1
numeric 8
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
sex 0 1.00 1 1 0 4 0
race 0 1.00 5 22 0 9 0
age 0 1.00 4 7 0 5 0
high_school 0 1.00 7 30 0 163 0
full_part 0 1.00 2 2 0 2 0
city 19 1.00 5 19 0 127 0
stat_code 19 1.00 2 2 0 16 0
pell_grant 0 1.00 1 1 0 2 0
camp_code 140 0.98 1 1 0 6 0
major 0 1.00 1 61 0 34 0
pass_engl 0 1.00 1 1 0 2 0
pass_math 0 1.00 1 1 0 2 0
summer1 0 1.00 1 1 0 1 0
fall 0 1.00 1 1 0 1 0
HS_classify 0 1.00 2 14 0 7 0

Variable type: logical

skim_variable n_missing complete_rate mean count
MCPS 0 1 0.7 TRU: 4963, FAL: 2160

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
u_number 0 1 20196625.60 5027.06 20190001 20191872.50 20193733.00 20201703.5 20203588.0 ▇▃▁▂▇
zip 19 1 20886.64 1559.40 1460 20853.00 20877.00 20903.0 94025.0 ▁▇▁▁▁
hours_attempted 0 1 12.46 6.23 1 9.00 12.00 15.0 54.0 ▆▇▁▁▁
hours_earned 0 1 7.85 7.43 0 3.00 6.00 12.0 54.0 ▇▃▁▁▁
mc_gpa 0 1 2.19 1.47 0 0.67 2.50 3.5 4.0 ▆▂▃▅▇
term_year 0 1 2020.47 0.50 2020 2020.00 2020.00 2021.0 2021.0 ▇▁▁▁▇
hours_earned_rate 0 1 0.57 0.38 0 0.23 0.64 1.0 3.2 ▇▇▁▁▁
unearned_hours 0 1 4.61 4.24 -22 0.00 4.00 7.0 25.0 ▁▁▇▂▁
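The skim output reports a maximum hours_earned_rate of 3.2 and a minimum unearned_hours of -22. A rate above 1 is possible because hours_earned can include AP credits that never appear in hours_attempted. The sketch below reproduces both extremes on three hypothetical students. It assumes that unearned_hours is hours_attempted minus hours_earned, which the data dictionary does not state.

```r
# Three hypothetical students (not rows from df_Degrees)
toy <- data.frame(
  hours_attempted = c(12, 15, 10),
  hours_earned    = c(12, 9, 32)  # third student: 10 earned in Fall plus 22 AP credits
)

# hours_earned_rate as defined in the data dictionary
toy$hours_earned_rate <- toy$hours_earned / toy$hours_attempted

# Assumed definition of unearned_hours (not given in the data dictionary)
toy$unearned_hours <- toy$hours_attempted - toy$hours_earned

print(toy)
```

The third row gives a rate of 3.2 and -22 unearned hours. This shows how AP credit alone can produce both extremes in the summary.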

Change Datatypes

# store student IDs and term years as character so they are treated as labels, not numbers
df_Degrees$u_number<- as.character(df_Degrees$u_number)
df_Degrees$term_year<- as.character(df_Degrees$term_year)

3.2 Create a DataFrame of MCPS High School Graduates Aged 20 and Younger

Use the data frame df_Degrees, which was cleaned in the initial data analysis, and keep only MCPS graduates aged 20 and younger.

df_MCPS20D <- df_Degrees %>%
         filter(HS_classify == "MCPS") %>%          # keep students who graduated from MCPS high schools
         filter(age %in% c("< 18", "18 - 20"))      # keep students aged 20 and younger

4 Demographics of MCPS High School Graduates Aged 20 and Younger

4.1 Full time versus Part-time Degree Students

Frequency of Part-time versus Full-time Students: 2020 vs 2021

# Number of part-time and full-time students, 2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=full_part, fill=full_part)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=2,size=3)+
      facet_wrap(~term_year)+
      ggtitle("Number of Students Full time versus Part time")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

# change in overall MCPS student population from 2020 to 2021

df_MCPS20D%>%
          group_by(term_year,full_part)%>%
          count(full_part)%>%
          group_by(term_year)%>%
          mutate(total_pop =sum(n))%>%
          group_by(full_part)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 5
## # Groups:   full_part [2]
##   term_year full_part     n total_pop pct_change
##   <chr>     <chr>     <int>     <int>      <dbl>
## 1 2020      FT         1655      2456      NA   
## 2 2021      FT         1556      2303      -5.98
## 3 2020      PT          801      2456      NA   
## 4 2021      PT          747      2303      -6.74

There was a 5.98% decrease in full-time students who graduated from MCPS high schools in term year 2021, and a 6.74% decrease in part-time students. Overall, the cohort fell from 2,456 to 2,303 students, a 6.23% decrease.
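The pct_change column is (n_2021 - n_2020) / n_2020 * 100 within each full_part group, and lag() supplies the 2020 value after arrange(term_year). The following base-R sketch recomputes the table from the printed counts. Those counts are copied by hand, so the sketch checks the arithmetic rather than the data. It also computes the change in the total population.

```r
# Counts copied from the tibble above, already ordered by term_year within group
enroll <- data.frame(
  term_year = c("2020", "2021", "2020", "2021"),
  full_part = c("FT", "FT", "PT", "PT"),
  n         = c(1655, 1556, 801, 747),
  stringsAsFactors = FALSE
)

# Percentage change from the previous row of the same group (NA for the first row)
pct_change <- function(x) c(NA, diff(x) / head(x, -1) * 100)
enroll$pct_change <- ave(enroll$n, enroll$full_part, FUN = pct_change)

# Change in the combined FT + PT population
total_2020 <- sum(enroll$n[enroll$term_year == "2020"])
total_2021 <- sum(enroll$n[enroll$term_year == "2021"])
total_change <- (total_2021 - total_2020) / total_2020 * 100

print(enroll)
print(round(total_change, 2))
```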

4.2 Race

Count of Race Groups

ggplot(data=df_MCPS20D, aes(x=race, fill=race)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0,size=3)+
      facet_wrap(~term_year + full_part)+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())+
      ggtitle("Number of Students per Race Group")+
      xlab("Race")+
      ylab("Frequency")

Full-time students: change in enrollment from 2020 to 2021 by race

# calculate percentage change in full-time student enrollment from 2020 to 2021 by race

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,race)%>%
          count(race)%>%
          group_by(race)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 18 x 4
## # Groups:   race [9]
##    term_year race                       n pct_change
##    <chr>     <chr>                  <int>      <dbl>
##  1 2020      Am. Indian / AK Native     5      NA   
##  2 2021      Am. Indian / AK Native     1     -80   
##  3 2020      Asian                    272      NA   
##  4 2021      Asian                    227     -16.5 
##  5 2020      Black / African Am.      389      NA   
##  6 2021      Black / African Am.      326     -16.2 
##  7 2020      Foreign                  103      NA   
##  8 2021      Foreign                   96      -6.80
##  9 2020      Hawaiian / Pac. Isl.       5      NA   
## 10 2021      Hawaiian / Pac. Isl.       3     -40   
## 11 2020      Hispanic                 534      NA   
## 12 2021      Hispanic                 596      11.6 
## 13 2020      Multi-Race                71      NA   
## 14 2021      Multi-Race                63     -11.3 
## 15 2020      Unknown                   11      NA   
## 16 2021      Unknown                    3     -72.7 
## 17 2020      White                    265      NA   
## 18 2021      White                    241      -9.06

Full-time students: there was a 16.5% decline in Asian students, a 16.2% decline in Black / African American students, an 11.3% decline in Multi-Race students, a 9.1% decline in White students and a 6.8% decline in Foreign students. Hispanic students increased by 11.6%.
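Percentage changes on very small groups swing widely. For example, Am. Indian / AK Native shows an 80% drop on a change from 5 students to 1. One way to keep this visible is to flag groups whose 2020 count falls below a cut-off. The cut-off of 30 used in this sketch is a hypothetical choice, not a rule from the source. The counts are copied from the full-time table above.

```r
# 2020 and 2021 full-time counts copied from the table above
race <- c("Am. Indian / AK Native", "Asian", "Black / African Am.", "Foreign",
          "Hawaiian / Pac. Isl.", "Hispanic", "Multi-Race", "Unknown", "White")
n_2020 <- c(5, 272, 389, 103, 5, 534, 71, 11, 265)
n_2021 <- c(1, 227, 326, 96, 3, 596, 63, 3, 241)

min_base <- 30  # hypothetical cut-off for a "reliable" baseline count

ft <- data.frame(
  race, n_2020, n_2021,
  pct_change = round((n_2021 - n_2020) / n_2020 * 100, 1),
  small_base = n_2020 < min_base,  # TRUE where the percentage rests on few students
  stringsAsFactors = FALSE
)

print(ft[order(ft$pct_change), ])
```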

Part-time students: change in enrollment from 2020 to 2021 by race

# calculate percentage change in part-time student enrollment from 2020 to 2021 by race

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,race)%>%
          count(race)%>%
          group_by(race)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 18 x 4
## # Groups:   race [9]
##    term_year race                       n pct_change
##    <chr>     <chr>                  <int>      <dbl>
##  1 2020      Am. Indian / AK Native     4      NA   
##  2 2021      Am. Indian / AK Native     1     -75   
##  3 2020      Asian                     69      NA   
##  4 2021      Asian                     63      -8.70
##  5 2020      Black / African Am.      177      NA   
##  6 2021      Black / African Am.      181       2.26
##  7 2020      Foreign                   73      NA   
##  8 2021      Foreign                   54     -26.0 
##  9 2020      Hawaiian / Pac. Isl.       1      NA   
## 10 2021      Hawaiian / Pac. Isl.       1       0   
## 11 2020      Hispanic                 327      NA   
## 12 2021      Hispanic                 263     -19.6 
## 13 2020      Multi-Race                33      NA   
## 14 2021      Multi-Race                35       6.06
## 15 2020      Unknown                    5      NA   
## 16 2021      Unknown                    2     -60   
## 17 2020      White                    112      NA   
## 18 2021      White                    147      31.2

Part-time students: there was an 8.7% decrease in Asian students, a 26.0% decrease in Foreign students and a 19.6% decrease in Hispanic students, and a 2.3% increase in Black / African American students. White students increased by 31.25% and Multi-Race students by 6.1%.

4.3 Gender

Gender of Students

# Gender of students part time and full time  2020 vs 2021
ggplot(data=df_MCPS20D, aes(x=sex, fill=sex)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=1,size=3)+
      facet_wrap(~term_year+full_part)+
      ggtitle("Gender of Students: Full time versus Part time")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())

Calculate percentage change in full time student enrollment from 2020 to 2021 by gender

# calculate percentage change in full time student enrollment from 2020 to 2021 by  gender

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,sex)%>%
          count(sex)%>%
          group_by(sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 4
## # Groups:   sex [2]
##   term_year sex       n pct_change
##   <chr>     <chr> <int>      <dbl>
## 1 2020      F       793      NA   
## 2 2021      F       819       3.28
## 3 2020      M       842      NA   
## 4 2021      M       719     -14.6

Full-time students: a 14.6% decrease in male students and a 3.28% increase in female students.

Calculate percentage change in part time student enrollment from 2020 to 2021 by gender

# calculate percentage change in part time student enrollment from 2020 to 2021 by  gender

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,sex)%>%
          count(sex)%>%
          group_by(sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 4 x 4
## # Groups:   sex [2]
##   term_year sex       n pct_change
##   <chr>     <chr> <int>      <dbl>
## 1 2020      F       381      NA   
## 2 2021      F       345      -9.45
## 3 2020      M       401      NA   
## 4 2021      M       395      -1.50

Part-time students: a 9.45% decrease in female students and a 1.50% decrease in male students.

Gender and Race breakdown of full time students

# Gender and Race of full time students  2020 vs 2021

df_MCPS20D%>%
      filter(sex %in% c("F","M"))%>%
      filter(full_part=="FT")%>%
      ggplot(., aes(x=race, fill=race)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0, size=3)+
      facet_wrap(~term_year+sex)+
      ggtitle("Gender and Race of Full time Students")+
      ylab('Frequency')+
      xlab("")+
      theme(axis.text.x=element_blank(),strip.background = element_blank(),panel.grid = element_blank())


Full-time student enrollment trend by gender and race

# calculate percentage change in student enrollment from 2020 to 2021 by race and gender

# create data frames with counts of full time students by race and gender
df_MCPS20D%>%
          filter(full_part=="FT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,race,sex)%>%
          count(sex)%>%
          group_by(race,sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 35 x 5
## # Groups:   race, sex [18]
##    term_year race                   sex       n pct_change
##    <chr>     <chr>                  <chr> <int>      <dbl>
##  1 2020      Am. Indian / AK Native F         4      NA   
##  2 2020      Am. Indian / AK Native M         1      NA   
##  3 2021      Am. Indian / AK Native M         1       0   
##  4 2020      Asian                  F       111      NA   
##  5 2021      Asian                  F       115       3.60
##  6 2020      Asian                  M       159      NA   
##  7 2021      Asian                  M       110     -30.8 
##  8 2020      Black / African Am.    F       178      NA   
##  9 2021      Black / African Am.    F       169      -5.06
## 10 2020      Black / African Am.    M       202      NA   
## # … with 25 more rows
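In the table above, Am. Indian / AK Native female students appear in 2020 but have no 2021 row, so no percentage change is computed for them. count() only returns combinations that occur in the data, so a drop to zero students disappears silently. A base-R table() keeps the empty cells, as this sketch on hypothetical records shows. Inside the existing dplyr pipeline, tidyr's complete() with fill = list(n = 0) is one way to get the same effect.

```r
# Hypothetical records: female students appear in 2020 but not in 2021
students <- data.frame(
  term_year = c("2020", "2020", "2020", "2021", "2021"),
  sex       = c("F", "F", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# table() cross-tabulates every year x sex combination, including zero cells;
# unclass() turns it into a plain integer matrix with row and column names
counts <- unclass(table(students$term_year, students$sex))

# Percentage change per sex; a group that disappears shows up as -100
pct_change <- (counts["2021", ] - counts["2020", ]) / counts["2020", ] * 100
print(pct_change)
```

A group that first appears in 2021 would give Inf instead of NA. This at least makes the new group visible.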

Part-time student enrollment trend by gender and race

# calculate percentage change in student enrollment from 2020 to 2021 by race and gender

# count part-time students by race and gender
df_MCPS20D%>%
          filter(full_part=="PT")%>%
          filter(sex=="F"|sex =="M")%>%
          group_by(term_year,race,sex)%>%
          count(sex)%>%
          group_by(race,sex)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100) 
## # A tibble: 31 x 5
## # Groups:   race, sex [17]
##    term_year race                   sex       n pct_change
##    <chr>     <chr>                  <chr> <int>      <dbl>
##  1 2020      Am. Indian / AK Native M         4      NA   
##  2 2021      Am. Indian / AK Native M         1     -75   
##  3 2020      Asian                  F        30      NA   
##  4 2021      Asian                  F        19     -36.7 
##  5 2020      Asian                  M        37      NA   
##  6 2021      Asian                  M        44      18.9 
##  7 2020      Black / African Am.    F        79      NA   
##  8 2021      Black / African Am.    F        84       6.33
##  9 2020      Black / African Am.    M        96      NA   
## 10 2021      Black / African Am.    M        94      -2.08
## # … with 21 more rows

4.4 Majors

Overall Majors trend

Count of Majors in Full time students in 2020

z1<- df_MCPS20D%>%
      filter(full_part=="FT" &term_year =="2020")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Full-time Students in 2020  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z1 + coord_flip()

Count of Majors in Full time students in 2021

z13<- df_MCPS20D%>%
      filter(full_part=="FT" &term_year =="2021")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Full-time Students in 2021  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z13 + coord_flip()

Calculate percentage change in full-time student majors from 2020 to 2021

df_MCPS20D%>%
          filter(full_part=="FT")%>%
          group_by(term_year,major)%>%
          count(major)%>%
          group_by(major)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 62 x 4
## # Groups:   major [33]
##    term_year major                        n pct_change
##    <chr>     <chr>                    <int>      <dbl>
##  1 2020      0                            3      NA   
##  2 2021      0                            2     -33.3 
##  3 2020      American Sign Language       5      NA   
##  4 2021      American Sign Language       1     -80   
##  5 2020      Applied Geography            1      NA   
##  6 2021      Applied Geography            2     100   
##  7 2020      Architectural Technology    15      NA   
##  8 2021      Architectural Technology    19      26.7 
##  9 2020      Art                         24      NA   
## 10 2021      Art                         22      -8.33
## # … with 52 more rows

Count of Majors in Part time students in 2020

z11<- df_MCPS20D%>%
      filter(full_part=="PT" &term_year =="2020")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Part-time Students in 2020  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z11 + coord_flip()

Count of Majors in Part time students in 2021

z12<- df_MCPS20D%>%
      filter(full_part=="PT" &term_year =="2021")%>%
       ggplot(., aes(x=major, fill=major)) +
      geom_bar() +
      geom_text(stat='count', aes(label=after_stat(count)), vjust=0, hjust=0, size =3)+
      ggtitle("Majors of Part-time Students in 2021  ")+
      xlab("Major")+
      ylab("Frequency")+
    theme(legend.position = "none") 
       
z12 + coord_flip()

Calculate percentage change in part-time student majors from 2020 to 2021

df_MCPS20D%>%
          filter(full_part=="PT")%>%
          group_by(term_year,major)%>%
          count(major)%>%
          group_by(major)%>%
          arrange(term_year,.by_group=TRUE)%>%
          mutate(pct_change= (n-lag(n))/lag(n)*100)
## # A tibble: 60 x 4
## # Groups:   major [32]
##    term_year major                        n pct_change
##    <chr>     <chr>                    <int>      <dbl>
##  1 2020      0                            5       NA  
##  2 2020      American Sign Language       1       NA  
##  3 2021      American Sign Language       2      100  
##  4 2020      Applied Geography            2       NA  
##  5 2020      Architectural Technology    13       NA  
##  6 2021      Architectural Technology     4      -69.2
##  7 2020      Art                         12       NA  
##  8 2021      Art                         14       16.7
##  9 2020      Broadcast Media              5       NA  
## 10 2021      Broadcast Media              4      -20  
## # … with 50 more rows

5 Statistical Analysis with Outliers

For this analysis, I first include the outliers and then repeat it after removing them.
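The text does not say which rule defines an outlier. A common choice flags values more than 1.5 * IQR beyond the first or third quartile. geom_boxplot() uses the same rule to decide which points to draw individually. The following is a minimal sketch of that rule, under the assumption that it is the one intended.

```r
# Flag values beyond k * IQR from the first or third quartile.
# k = 1.5 matches the default whisker rule of geom_boxplot(); whether the
# author uses this rule is an assumption.
iqr_outlier <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)  # default (type 7) quartiles
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Hypothetical hours_attempted values for one group
hours <- c(12, 13, 13, 14, 15, 15, 16, 17, 46, 54)
kept <- hours[!iqr_outlier(hours)]
print(kept)  # 46 and 54 lie above the upper fence of 22 and are dropped
```

On the real data, the fences would normally be computed within each comparison group (for example, full_part and term_year) rather than on the pooled data.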

5.1 Hours Attempted

Boxplots of hours_attempted by year by MCPS students 20yrs and younger

p11 = ggplot(df_MCPS20D, aes(hours_attempted))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Students who register for more than 18 credits require special permission from the department. Furthermore, a full-time student is enrolled in 12 or more credits, and a part-time student is enrolled in fewer than 12 credits. However, in this dataset a number of full-time students attempt fewer than 12 credits and a number of part-time students attempt more than 12. This may partly reflect that hours_attempted also counts non-credit hours and Summer 1 hours, as noted in the data dictionary.
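The mismatch between full_part and hours_attempted can be counted directly. This sketch runs on hypothetical records. It flags full-time students below 12 hours, part-time students at 12 hours or more (using the "fewer than 12" definition above), and anyone above the 18-credit permission threshold. On the real data, the same expressions would be applied to df_MCPS20D.

```r
# Hypothetical records (not taken from df_MCPS20D)
students <- data.frame(
  full_part       = c("FT", "FT", "FT", "PT", "PT", "PT"),
  hours_attempted = c(15, 9, 22, 6, 14, 12),
  stringsAsFactors = FALSE
)

# FT should mean >= 12 hours and PT should mean < 12 hours
students$mismatch <- with(students,
  (full_part == "FT" & hours_attempted < 12) |
  (full_part == "PT" & hours_attempted >= 12))

# More than 18 credits requires departmental permission
students$over_18 <- students$hours_attempted > 18

# Cross-tabulate enrollment status against the mismatch flag
print(table(students$full_part, students$mismatch))
```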

Boxplots of hours_attempted by year by Full time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_attempted))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of hours_attempted by year by Part time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_attempted))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.

Density plot of hours_attempted by year

ggplot(df_MCPS20D, aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("Hours attempted") +
  ylab( "Density")+
   ggtitle(" Hours Attempted by Full-time Students vs Part-time Students")

Hours attempted by full time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours attempted") +
  ylab( "Density") +
  ggtitle(" Hours Attempted by Full-time Students")

Five-number summary of hours attempted: full-time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_attempted)[1],
            Q1 = fivenum(hours_attempted)[2],
            median = fivenum(hours_attempted)[3],
            Q3 = fivenum(hours_attempted)[4],
            max = fivenum(hours_attempted)[5],
            mean= mean(hours_attempted),
            sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          5     6  13       17  19      36  18.2 11.1 
##  2 Am. Indian / AK N… 2021          1    13  13       13  13      13  13   NA   
##  3 Asian              2020        272     6  13       15  20      52  17.7  8.13
##  4 Asian              2021        227     7  13       15  17      46  16.8  7.00
##  5 Black / African A… 2020        389     5  12       13  14      42  13.7  3.53
##  6 Black / African A… 2021        326     4  12       14  16      38  14.9  4.31
##  7 Foreign            2020        103     7  12       14  17      31  14.8  4.26
##  8 Foreign            2021         96     7  12.5     15  16      37  15.7  5.29
##  9 Hawaiian / Pac. I… 2020          5     9  12       13  13      15  12.4  2.19
## 10 Hawaiian / Pac. I… 2021          3    12  15.5     19  24.5    30  20.3  9.07
## 11 Hispanic           2020        534     4  12       13  15      39  14.2  4.43
## 12 Hispanic           2021        596     3  12       14  16      43  15.0  4.33
## 13 Multi-Race         2020         71     6  12       13  17      44  16.7  8.04
## 14 Multi-Race         2021         63     6  12       14  16.5    43  15.7  6.22
## 15 Unknown            2020         11     9  12       14  15      31  15    5.78
## 16 Unknown            2021          3    12  12       12  13      14  12.7  1.15
## 17 White              2020        265     8  12       13  16      46  15.9  7.08
## 18 White              2021        241     7  13       14  17      54  16.5  6.37
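The Q1 and Q3 columns come from fivenum(), which returns Tukey's hinges. geom_boxplot() instead draws its box from quantile() (type 7 by default). The two can differ slightly, especially for small groups, so the table and the boxplots may not line up exactly. The following example uses six values.

```r
x <- c(1, 2, 3, 4, 5, 6)

# Tukey five-number summary: min, lower hinge, median, upper hinge, max
hinges <- fivenum(x)

# Default (type 7) sample quartiles, as used for the box in geom_boxplot()
quartiles <- quantile(x, c(0.25, 0.75))

print(hinges)     # 1.0 2.0 3.5 5.0 6.0
print(quartiles)  # 25%: 2.25, 75%: 4.75
```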

Hours attempted by part time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_attempted, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours attempted") +
  ylab( "Density")+
   ggtitle(" Hours Attempted by Part-time Students")

Five-number summary of hours attempted: part-time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_attempted)[1],
            Q1 = fivenum(hours_attempted)[2],
            median = fivenum(hours_attempted)[3],
            Q3 = fivenum(hours_attempted)[4],
            max = fivenum(hours_attempted)[5],
            mean= mean(hours_attempted),
            sd = sd(hours_attempted))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          4     3   3      5.5   8.5     9  5.75  3.20
##  2 Am. Indian / AK N… 2021          1     6   6      6     6       6  6    NA   
##  3 Asian              2020         69     2   6      9    10      33  8.62  4.94
##  4 Asian              2021         63     3   7.5    9    11      21  8.90  3.64
##  5 Black / African A… 2020        177     1   6      7     9      15  7.28  2.62
##  6 Black / African A… 2021        181     1   6      8    10      25  7.80  3.37
##  7 Foreign            2020         73     3   6      8    10      23  8.18  3.89
##  8 Foreign            2021         54     3   5      9    10      29  8.61  4.38
##  9 Hawaiian / Pac. I… 2020          1     6   6      6     6       6  6    NA   
## 10 Hawaiian / Pac. I… 2021          1     5   5      5     5       5  5    NA   
## 11 Hispanic           2020        327     1   6      8     9      21  7.84  3.06
## 12 Hispanic           2021        263     1   6      8    11      42  8.73  4.41
## 13 Multi-Race         2020         33     1   4      8     9      12  7.03  2.98
## 14 Multi-Race         2021         35     3   6      9    10      26  8.34  3.90
## 15 Unknown            2020          5     7   9     10    10      10  9.2   1.30
## 16 Unknown            2021          2     4   4      6.5   9       9  6.5   3.54
## 17 White              2020        112     1   6      8    10      33  8.15  4.47
## 18 White              2021        147     3   5      8    10      39  8.43  5.02
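Each summarise() call above runs fivenum() five times per group. A base-R alternative computes the summary once per group with aggregate(), and the five statistics arrive as a matrix column. The sketch uses hypothetical data.

```r
# Hypothetical records for two groups
toy <- data.frame(
  race            = c("A", "A", "A", "B", "B", "B", "B"),
  hours_attempted = c(3, 6, 9, 1, 4, 8, 12),
  stringsAsFactors = FALSE
)

# One fivenum() call per group; the result column is a 5-column matrix
summ <- aggregate(hours_attempted ~ race, data = toy, FUN = fivenum)

# Flatten the matrix column into named columns
stats <- as.data.frame(summ$hours_attempted)
names(stats) <- c("min", "Q1", "median", "Q3", "max")
result <- cbind(race = summ$race, stats)
print(result)
```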

5.2 Hours Earned

Boxplots of Hours Earned by year by MCPS students 20yrs and younger

p11 = ggplot(df_MCPS20D, aes(hours_earned))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)

Boxplots of hours_earned by year by Full time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of hours_earned by year by Part time MCPS students 20yrs and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

There are not many outliers in the part time student groups. Term year 2021 seems to have more outliers on the upper end.

Density plot of hours_earned by year

ggplot(df_MCPS20D, aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("Hours Earned") +
  ylab( "Density")+
  ggtitle(" Hours Earned by Full-time vs Part-time Students")

Hours earned by full-time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned") +
  ylab( "Density")+
   ggtitle(" Hours Earned by Full-time Students")

Five-number summary of hours earned: full-time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_earned)[1],
            Q1 = fivenum(hours_earned)[2],
            median = fivenum(hours_earned)[3],
            Q3 = fivenum(hours_earned)[4],
            max = fivenum(hours_earned)[5],
            mean= mean(hours_earned),
            sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race               term_year     n   min    Q1 median    Q3   max  mean    sd
##    <chr>              <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
##  1 Am. Indian / AK N… 2020          5     0  10       14  19      36 15.8  13.3 
##  2 Am. Indian / AK N… 2021          1    13  13       13  13      13 13    NA   
##  3 Asian              2020        272     0   9       13  17      52 14.7   9.15
##  4 Asian              2021        227     0   9       12  16      46 13.4   8.53
##  5 Black / African A… 2020        389     0   6        9  12      42  8.85  5.58
##  6 Black / African A… 2021        326     0   6        9  13      37  9.55  6.53
##  7 Foreign            2020        103     0   6        9  13      31 10.4   6.44
##  8 Foreign            2021         96     0   6       10  13      37 10.6   7.56
##  9 Hawaiian / Pac. I… 2020          5     0   0        9  12      13  6.8   6.38
## 10 Hawaiian / Pac. I… 2021          3     9  12.5     16  23      30 18.3  10.7 
## 11 Hispanic           2020        534     0   6        9  12      38  9.57  6.50
## 12 Hispanic           2021        596     0   6       10  13      33  9.73  6.36
## 13 Multi-Race         2020         71     0   7       12  15      44 13.4   9.76
## 14 Multi-Race         2021         63     0   6       10  13.5    43 10.8   8.69
## 15 Unknown            2020         11     3   5        9  13      31 10.5   7.90
## 16 Unknown            2021          3     3   5        7   9.5    12  7.33  4.51
## 17 White              2020        265     0   7       11  15      46 12.3   8.88
## 18 White              2021        241     0   7       12  15      54 12.5   8.22

Hours earned by part-time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned") +
  ylab( "Density")+
   ggtitle(" Hours Earned by Part-time Students")

Five-number summary of hours earned: part-time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(hours_earned)[1],
            Q1 = fivenum(hours_earned)[2],
            median = fivenum(hours_earned)[3],
            Q3 = fivenum(hours_earned)[4],
            max = fivenum(hours_earned)[5],
            mean= mean(hours_earned),
            sd = sd(hours_earned))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          4     0   1.5    3       3     3  2.25  1.5  
##  2 Am. Indian / AK … 2021          1     3   3      3       3     3  3    NA    
##  3 Asian             2020         69     0   0      4       6    33  4.81  5.50 
##  4 Asian             2021         63     0   3      3       6    21  4.73  4.48 
##  5 Black / African … 2020        177     0   0      3       4    11  2.73  2.82 
##  6 Black / African … 2021        181     0   0      1       6    22  2.81  3.72 
##  7 Foreign           2020         73     0   0      3       6    21  3.96  4.60 
##  8 Foreign           2021         54     0   0      0       6    29  3.09  5.04 
##  9 Hawaiian / Pac. … 2020          1     0   0      0       0     0  0    NA    
## 10 Hawaiian / Pac. … 2021          1     3   3      3       3     3  3    NA    
## 11 Hispanic          2020        327     0   0      3       6    21  3.48  3.84 
## 12 Hispanic          2021        263     0   0      3       6    42  4.65  5.14 
## 13 Multi-Race        2020         33     0   1      3       9    11  4.27  3.83 
## 14 Multi-Race        2021         35     0   0      3       6    26  4.11  4.95 
## 15 Unknown           2020          5     0   1      1       4     9  3     3.67 
## 16 Unknown           2021          2     3   3      3.5     4     4  3.5   0.707
## 17 White             2020        112     0   0      4       7    27  4.74  4.87 
## 18 White             2021        147     0   3      4       7    33  5.16  5.14
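The Q1 and Q3 columns come from `fivenum()`. This function returns Tukey's five-number summary: the minimum, lower hinge, median, upper hinge and maximum. The hinges do not always equal the quartiles that `quantile()` reports by default (type 7), so these columns may not match quartiles computed with other tools. Here is a small illustration on made-up values:

```r
# Toy credit-hour values (hypothetical, not from the MCPS data)
x <- c(0, 0, 3, 3, 4, 6, 7, 9, 11, 21)

# Tukey's five-number summary: min, lower hinge, median, upper hinge, max
tukey <- fivenum(x)

# Default sample quantiles (type = 7) at the matching probabilities
q7 <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)

# Side by side: for this vector the upper hinge (9) and the type-7 Q3 (8.5) differ
print(rbind(fivenum = tukey, quantile_type7 = q7))
```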

5.3 GPA

Boxplots of GPA by term year for MCPS students aged 20 and younger

p11 = ggplot(df_MCPS20D, aes(mc_gpa))
p11 + geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~full_part)
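The analysis is meant to be run with and without outliers, so it helps to know which points a boxplot draws individually. The ggplot2 documentation for `geom_boxplot()` describes the rule: the whiskers reach the most extreme values within 1.5 times the IQR of the hinges, and anything beyond is plotted as a separate point. The sketch below applies that rule to a made-up GPA vector. `flag_outliers()` is a hypothetical helper. It uses `quantile()`'s default quartiles, which should match what ggplot2 computes. These can differ slightly from the hinges used by `fivenum()` and base R's `boxplot()`.

```r
# Hypothetical helper: flag values more than coef * IQR beyond the quartiles
# (NA inputs give NA flags)
flag_outliers <- function(x, coef = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  iqr <- q[2] - q[1]
  x < q[1] - coef * iqr | x > q[2] + coef * iqr
}

# Toy GPA values (hypothetical, not from the MCPS data)
gpa <- c(0, 2.3, 2.5, 2.7, 3.0, 3.1, 3.3, 3.5, 3.7, 4.0)
flags <- flag_outliers(gpa)

print(gpa[flags])    # values a boxplot would draw as individual points
print(gpa[!flags])   # the data with those values removed
```

On the real data, the flags would need to be computed within the same groups the boxplots use (for example `race` and `term_year`), because each box has its own quartiles.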

Boxplots of GPA by term year for full-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(mc_gpa))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of GPA by term year for part-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(mc_gpa))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Density plot of GPA by year

ggplot(df_MCPS20D, aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~full_part)+
  xlab("GPA") +
  ylab( "Density")+
  ggtitle(" GPA by Full-time vs Part-time Students")
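`geom_density()` defaults to a Gaussian kernel with the `nrd0` bandwidth rule, the same defaults as base R's `density()`. As a result, part of the estimated density falls below 0 and above 4, outside the GPA scale. Every large group in the summaries below has a minimum GPA of 0, so this mainly smooths out the spike at 0. The sketch below uses made-up GPAs to estimate how much of the density lands outside 0 to 4.

```r
# Toy GPA values with a cluster of zeros (hypothetical, not from the MCPS data)
gpa <- c(0, 0, 0, 0, 1.5, 2.3, 2.7, 3.0, 3.3, 3.6, 4.0)

# Gaussian kernel with the nrd0 bandwidth (the defaults of density())
d <- density(gpa, bw = "nrd0", kernel = "gaussian")

# Riemann-sum approximation of the density mass outside the 0-4 GPA scale
step <- d$x[2] - d$x[1]
mass_outside <- sum(d$y[d$x < 0 | d$x > 4]) * step
print(round(mass_outside, 3))
```

A smaller `adjust` in `geom_density()` narrows the kernel and reduces this spill, at the cost of a noisier curve.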

GPA of full-time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("GPA") +
  ylab( "Density")+
   ggtitle(" GPA of Full-time Students")

Five-number summary of GPA for full-time students

df_MCPS20D%>% filter(full_part=="FT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(mc_gpa)[1],
            Q1 = fivenum(mc_gpa)[2],
            median = fivenum(mc_gpa)[3],
            Q3 = fivenum(mc_gpa)[4],
            max = fivenum(mc_gpa)[5],
            mean= mean(mc_gpa),
            sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          5  0     2.35   2.9   3.5   4     2.55  1.55 
##  2 Am. Indian / AK … 2021          1  2.77  2.77   2.77  2.77  2.77  2.77 NA    
##  3 Asian             2020        272  0     2.33   3.3   3.73  4     2.93  1.03 
##  4 Asian             2021        227  0     2.5    3.23  3.71  4     2.88  1.12 
##  5 Black / African … 2020        389  0     1.5    2.5   3.14  4     2.25  1.18 
##  6 Black / African … 2021        326  0     1.33   2.67  3.4   4     2.31  1.30 
##  7 Foreign           2020        103  0     2      3     3.65  4     2.71  1.20 
##  8 Foreign           2021         96  0     1.46   2.82  3.69  4     2.48  1.35 
##  9 Hawaiian / Pac. … 2020          5  0     0      2.25  2.67  3.77  1.74  1.68 
## 10 Hawaiian / Pac. … 2021          3  1.75  2.22   2.68  3.34  4     2.81  1.13 
## 11 Hispanic          2020        534  0     1.5    2.70  3.44  4     2.38  1.25 
## 12 Hispanic          2021        596  0     1.23   2.66  3.33  4     2.29  1.30 
## 13 Multi-Race        2020         71  0     2      2.75  3.5   4     2.59  1.13 
## 14 Multi-Race        2021         63  0     1.5    2.6   3.54  4     2.37  1.35 
## 15 Unknown           2020         11  0.33  2.12   2.33  3.32  4     2.55  1.00 
## 16 Unknown           2021          3  2.55  2.65   2.75  3.38  4     3.1   0.786
## 17 White             2020        265  0     1.8    3     3.6   4     2.59  1.22 
## 18 White             2021        241  0     2      3     3.69  4     2.67  1.26
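To read the table above as a before/after comparison, the change in median GPA from term year 2020 to 2021 can be computed directly. The sketch copies the printed medians of the four groups shown in the plots, so the values carry the tibble's display rounding, and subtracts them.

```r
# Median GPA of full-time students, copied from the printed summary above
ft <- data.frame(
  race  = c("Asian", "Black / African Am.", "Hispanic", "White"),
  m2020 = c(3.30, 2.50, 2.70, 3.00),
  m2021 = c(3.23, 2.67, 2.66, 3.00),
  stringsAsFactors = FALSE
)

# Change in median GPA from the 2020 to the 2021 term year
ft$change <- ft$m2021 - ft$m2020
print(ft)
```

On these printed values, the median is unchanged for White students, about 0.17 points higher for Black / African Am. students and slightly lower for Asian and Hispanic students. These are descriptive differences, not significance tests.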

GPA of part-time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(mc_gpa, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("GPA") +
  ylab( "Density")+
   ggtitle(" GPA of Part-time Students")

Five-number summary of GPA for part-time students

df_MCPS20D%>% filter(full_part=="PT")%>%
  group_by(race,term_year)%>%
  summarise(n = n(),
            min = fivenum(mc_gpa)[1],
            Q1 = fivenum(mc_gpa)[2],
            median = fivenum(mc_gpa)[3],
            Q3 = fivenum(mc_gpa)[4],
            max = fivenum(mc_gpa)[5],
            mean= mean(mc_gpa),
            sd = sd(mc_gpa))
## `summarise()` has grouped output by 'race'. You can override using the `.groups` argument.
## # A tibble: 18 x 10
## # Groups:   race [9]
##    race              term_year     n   min    Q1 median    Q3   max  mean     sd
##    <chr>             <chr>     <int> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>
##  1 Am. Indian / AK … 2020          4     0  0.5    1.25  2.25     3  1.38  1.25 
##  2 Am. Indian / AK … 2021          1     2  2      2     2        2  2    NA    
##  3 Asian             2020         69     0  0      2.3   3.33     4  2.01  1.54 
##  4 Asian             2021         63     0  0.8    2     3.28     4  1.94  1.48 
##  5 Black / African … 2020        177     0  0      1.33  2.71     4  1.46  1.38 
##  6 Black / African … 2021        181     0  0      0.33  2.33     4  1.13  1.32 
##  7 Foreign           2020         73     0  0      2     3        4  1.65  1.51 
##  8 Foreign           2021         54     0  0      0     2.67     4  1.20  1.46 
##  9 Hawaiian / Pac. … 2020          1     0  0      0     0        0  0    NA    
## 10 Hawaiian / Pac. … 2021          1     4  4      4     4        4  4    NA    
## 11 Hispanic          2020        327     0  0      1.5   3        4  1.60  1.50 
## 12 Hispanic          2021        263     0  0      2     3        4  1.73  1.43 
## 13 Multi-Race        2020         33     0  0.67   2     3.5      4  1.99  1.51 
## 14 Multi-Race        2021         35     0  0      2.5   3        4  1.80  1.55 
## 15 Unknown           2020          5     0  0.75   2     3.67     4  2.08  1.75 
## 16 Unknown           2021          2     3  3      3.5   4        4  3.5   0.707
## 17 White             2020        112     0  0      2     3.33     4  1.86  1.54 
## 18 White             2021        147     0  0.55   2.5   3.33     4  2.16  1.49
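Groups with a single student show `NA` in the `sd` column because the sample standard deviation divides by n - 1. The Unknown 2021 row (n = 2, minimum 3, maximum 4) must consist of the values 3 and 4, which reproduces its printed sd of 0.707. Groups this small make year-to-year comparisons fragile, which may be why the plots keep only the four largest race groups. Below is a sketch with a hypothetical helper, `keep_large_groups()`, and an arbitrary cutoff of 30 students.

```r
# The sample standard deviation divides by n - 1, so a single value gives NA
s1 <- sd(3)
# Two values, as in the Unknown 2021 row: sqrt(0.5)
s2 <- sd(c(3, 4))
print(c(s1, s2))

# Hypothetical helper: keep summary rows with at least min_n students
# (min_n = 30 is an arbitrary illustrative threshold)
keep_large_groups <- function(summary_df, min_n = 30) {
  summary_df[summary_df$n >= min_n, , drop = FALSE]
}

# A few rows' n values copied from the part-time GPA table above
pt <- data.frame(
  race      = c("Asian", "Unknown", "White", "Multi-Race"),
  term_year = c("2020", "2021", "2021", "2020"),
  n         = c(69, 2, 147, 33),
  stringsAsFactors = FALSE
)
print(keep_large_groups(pt))
```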

5.4 Hours Earned Rate

Density plot of Hours Earned Rate by year

ggplot(df_MCPS20D, aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.3) +
  facet_wrap(~full_part)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
  xlim(0,1)
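The data dictionary defines `hours_earned_rate` as `hours_earned / hours_attempted`. Because `hours_earned` can include AP and summer credits, the ratio can exceed 1. A student with no attempted hours gets `NaN` (0/0) or `Inf`. In ggplot2, `xlim(0, 1)` sets the scale limits, so out-of-range and non-finite values are removed (with a warning) before `geom_density()` estimates the curve. By contrast, `coord_cartesian(xlim = c(0, 1))` would zoom the view without dropping rows. The sketch below uses made-up credit counts and counts the rows such a limit would discard.

```r
# Hypothetical credit counts (not from the MCPS data)
hours_earned    <- c(12, 9, 0, 15, 0, 6)
hours_attempted <- c(12, 12, 6, 12, 0, 6)

# hours_earned_rate as defined in the data dictionary
rate <- hours_earned / hours_attempted
print(rate)

# Rows that scale limits of 0 to 1 would drop before the density is estimated:
# missing/NaN rates and rates outside [0, 1] (Inf counts as > 1)
outside <- is.na(rate) | rate < 0 | rate > 1
print(sum(outside))
```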

Boxplots of hours earned rate for full-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned_rate))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Boxplots of hours earned rate for part-time MCPS students aged 20 and younger

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
   ggplot(., aes(hours_earned_rate))+
   geom_boxplot(aes(colour = term_year)) +
       facet_wrap(~race)

Hours earned rate of full-time students

df_MCPS20D%>%filter(full_part=="FT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
   ggtitle(" Hours Earned Rate of Full-time Students")

Hours earned rate of part-time students

df_MCPS20D%>%filter(full_part=="PT")%>%
  filter(race=="White" |race=="Asian" |race=="Hispanic" |race=="Black / African Am." )%>%
  ggplot(., aes(hours_earned_rate, fill = term_year)) + geom_density(alpha = 0.2) +
  facet_wrap(~race)+
  xlab("Hours Earned Rate") +
  ylab( "Density")+
   ggtitle(" Hours Earned Rate of Part-time Students")